1 Purpose

The purpose of this analysis is to estimate the growth rates of bacterial species in the cecal microbiomes of turkey’s fed various amounts of BMD (bacitracin methylene disalicylate)

1.1 How to estimate bacterial growth rate

Taken from “Measurement of bacterial replication rates in microbial communities”
Christopher T Brown, Matthew R Olm, Brian C Thomas & Jillian F Banfield
Nature Biotechnology volume34, pages1256–1263 (2016)

Dividing cells in a natural population contain, on average, more than one copy of their genome. In an unsynchronized population of growing bacteria, cells contain genomes that are replicated to different extents, resulting in a gradual reduction in the average genome copy number from the origin to the terminus of replication. This decrease can be detected by measuring changes in DNA sequencing coverage across complete genomes. Bacterial genome replication proceeds bi-directionally from a single origin of replication, therefore the origin and terminus of replication can be deduced based on this coverage pattern. GC skew and genome coverage analyses of a wide variety of bacteria have shown that this replication mechanism is broadly applicable. Further, early studies of bacterial cultures revealed that cells can achieve faster division by simultaneously initiating multiple rounds of genome replication, which results in an average of more than two genome copies in rapidly growing cells.

iRep_explain

iRep_explain

(a) Populations of bacteria undergoing rapid cell division differ from slowly growing populations in that the individual cells of a growing population are more actively in the process of replicating their genomes (purple circles). (b) Differences in genome copy number across a population of replicating cells can be determined based on sequencing read coverage over complete genome sequences. The ratio between the coverage at the origin (“peak”) and terminus (“trough”) of replication (PTR) relates to the replication rate of the population. The origin and terminus can be determined based on cumulative GC skew. (c,d) If no complete genome sequence is available, it is possible to calculate the replication rate based on the distribution of coverage values across a draft-quality genome using the iRep method. Coverage is first calculated across overlapping segments of genome fragments. Growing populations will have a wider distribution of coverage values compared with stable populations (histograms). These values are ordered from lowest to highest, and linear regression is used to evaluate the coverage distribution across the genome in order to determine the coverage values associated with the origin and terminus of replication. iRep is calculated as the ratio of these values. (e) Genome-resolved metagenomics involves DNA extraction from a microbiome sample followed by DNA sequencing, assembly, and genome binning. Binning is the grouping together of assembled genome fragments that originated from the same genome. This can be done based on shared characteristics of each fragment, such as sequence composition, taxonomic affiliation, or abundance.

2 General workflow

  1. Assemble shotgun metagenome
  2. Bin metagenomic assembly into discrete genomes
  3. Calculate coverage patterns in these bins
  4. Infer growth rate from coverage patterns

2.1 Experimental design

exp_design

exp_design

  • Collected cecal contents at necropsy
  • 10 birds per treatment per timepoint = 90 samples total
  • No repeated measurements, (cecal contents are a terminal sample)
  • bulk DNA extraction

2.2 Data Description

  • HiSeq 2x150 PE reads
    1 library per bird = 90 libraries
    2,480,187,759 paired reads
    avg insert = 280 bp

  • PacBio RSII reads
    library preps from pooled samples
    1 library prep per timepoint
    equal pool of all treatments
    3 PacBio RSII library preps
    3 SMRT cells?
    7,178,582 total reads Not as long as I was expecting

2.3 Metagenomic Assembly

  • One large co-assembly with all data combined.
  • bbtools used to QC and normalize reads
  • Metaspades used for hybrid assembly
    Standard short-read assembly, then long reads aligned to assembly graph

2.3.1 Read QC

  • The script ALL_QC.SLURM describes the pre-processing steps for the hiseq reads
    1. remove optical duplicates
    2. remove low quality regions on the flow cell
    3. trim adapters
    4. remove artifacts (phiX etc.)
    5. remove turkey reads
    6. error correct
    7. normalize
#!/bin/bash

#SBATCH --job-name="ALL_QC"                            # name of the job submitted
#SBATCH -p mem                                         # name of the queue you are submitting to
#SBATCH -n 40                                          # number of cores/tasks in this job, you get all 20 cores with 2 threads per core with hyperthreading
#SBATCH -N 1                                           # nodes
#SBATCH --mem=1000G                                    # memory allocation       
#SBATCH -t 72:00:00                                    # time allocated for this job hours:mins:seconds
#SBATCH --mail-user=julestrachsel@gmail.com            # will receive an email when job starts, ends or fails
#SBATCH --mail-type=BEGIN,END,FAIL                     # will receive an email when job starts, ends or fails
#SBATCH -o "stdout.%j.%N"                              # standard out %j adds job number to outputfile name and %N adds the node name
#SBATCH -e "stderr.%j.%N"                              # optional but it prints our standard error

# ENTER COMMANDS HERE:

module load java/1.8.0_121

set -e # I think this means if anything fails it immediately exits and doesnt keep plowing ahead.

#Written by Brian Bushnell
#Last updated March 4, 2019

#This script is designed to preprocess data for assembly of overlapping 2x150bp reads from Illumina HiSeq 2500.
#Some numbers and steps may need adjustment for different data types or file paths.
#For large genomes, tadpole and bbmerge (during the "Merge" phase) may need the flag "prefilter=2" to avoid running out of memory.
#"prefilter" makes these take twice as long though so don't use it if you have enough memory.
#The "rm ALL_temp.fq.gz; ln -s reads.fq.gz ALL_temp.fq.gz" is not necessary but added so that any pipeline stage can be easily disabled,
#without affecting the input file name of the next stage.

#interleave reads
#reformat.sh in=ALL_R#.fastq.gz out=ALL.fq.gz

# 92 fecal shotgun metagenomic libraries
# sequenced on HiSeq3000 2x150 PE
# I think avg insert sizes is ~280?

# this takes the R1 and R2 files and creates interleaved fastq.gz files for each
for x in *_R1.fastq.gz 
do
#echo "${x%_R1*}_R2.fastq.gz"
reformat.sh in1="$x" in2="${x%_R1*}_R2.fastq.gz" out="${x%_R1*}.fq.gz"
done

# combines all interleaved reads into one file
cat *.fq.gz > ALL.fq.gz

#Link the interleaved input file as "ALL_temp.fq.gz"
ln -s ALL.fq.gz ALL_temp.fq.gz

# --- Preprocessing ---

#Remove optical duplicates
clumpify.sh in=ALL_temp.fq.gz out=ALL.clumped.fq.gz dedupe optical
rm ALL_temp.fq.gz; ln -s ALL.clumped.fq.gz ALL_temp.fq.gz

#Remove low-quality regions
filterbytile.sh in=ALL_temp.fq.gz out=ALL.filtered_by_tile.fq.gz
rm ALL_temp.fq.gz; ln -s ALL.filtered_by_tile.fq.gz ALL_temp.fq.gz

#Trim adapters.  Optionally, reads with Ns can be discarded by adding "maxns=0" and reads with really low average quality can be discarded with "maq=8".
bbduk.sh in=ALL_temp.fq.gz out=ALL.trimmed.fq.gz ktrim=r k=23 mink=11 hdist=1 tbo tpe minlen=70 ref=adapters ftm=5 ordered
rm ALL_temp.fq.gz; ln -s ALL.trimmed.fq.gz ALL_temp.fq.gz

#Remove synthetic artifacts and spike-ins by kmer-matching.
bbduk.sh in=ALL_temp.fq.gz out=ALL.filtered.fq.gz k=31 ref=artifacts,phix ordered cardinality
rm ALL_temp.fq.gz; ln -s ALL.filtered.fq.gz ALL_temp.fq.gz

#Decontamination by mapping can be done here.
#JGI removes these in two phases:
#1) common microbial contaminants (E.coli, Pseudomonas, Delftia, others)
#2) common animal contaminants (Human, cat, dog, mouse)

bbsplit.sh in=SAMPLE_temp.fq.gz ref=turkey.fa basename=SAMPLE_out_%.fq.gz outu=SAMPLE_clean.fq.gz int=t
rm SAMPLE_temp.fq.gz; ln -s SAMPLE_clean.fq.gz SAMPLE_temp.fq.gz


#Error-correct phase 1
bbmerge.sh in=ALL_temp.fq.gz out=ALL.ecco.fq.gz ecco mix vstrict ordered ihist=ihist_merge1.txt
rm ALL_temp.fq.gz; ln -s ALL.ecco.fq.gz ALL_temp.fq.gz

#Error-correct phase 2
clumpify.sh in=ALL_temp.fq.gz out=ALL.eccc.fq.gz ecc passes=4 reorder
rm ALL_temp.fq.gz; ln -s ALL.eccc.fq.gz ALL_temp.fq.gz

#Error-correct phase 3
#Low-depth reads can be discarded here with the "tossjunk", "tossdepth", or "tossuncorrectable" flags.
#For very large datasets, "prefilter=1" or "prefilter=2" can be added to conserve memory.
#Alternatively, bbcms.sh can be used if Tadpole still runs out of memory.
tadpole.sh in=ALL_temp.fq.gz out=ALL.ecct.fq.gz ecc k=62 ordered tossjunk prefilter=1
rm ALL_temp.fq.gz; ln -s ALL.ecct.fq.gz ALL_temp.fq.gz

#Normalize
#This phase can be very beneficial for data with uneven coverage like metagenomes, MDA-amplified single cells, and RNA-seq, but is not usually recommended for isolate DNA.
#So normally, this stage should be commented out, as it is here.
bbnorm.sh in=ALL_temp.fq.gz out=normalized.fq.gz target=100 hist=khist.txt peaks=peaks.txt
rm ALL_temp.fq.gz; ln -s normalized.fq.gz ALL_temp.fq.gz
    

2.3.3 Assembly statistics



  • Comparison of full assemblies
  • contigs less than 500 bp removed
attribute Megahit Metaspades
Main genome scaffold total: 1,486,743 1,370,062
Main genome contig total: 1,486,743 1,390,561
Main genome scaffold sequence total: 3849 MB 3637 MB
Main genome contig sequence total: 3849 MB 3636 MB
Main genome scaffold N/L50: 114231 / 6624 103354 / 6047
Main genome contig N/L50: 114231 / 6624 109063 / 5812
Main genome scaffold N/L90: 152813 / 5065 172075 / 3856
Main genome contig N/L90: 152813 / 5065 180267 / 3747
Main genome assembly score: 18.019 19.793
Max scaffold length: 1.063 MB 1.137 MB
Max contig length: 1.063 MB 1.063 MB
Number of scaffolds > 50 KB: 3,899 5,000
Number of bases in scaffolds>50 KB: 347 MB 482 MB
% main genome in scaffolds > 50 KB: 9.04% 13.26%
  • Comparisons at different minimum contig length thresholds
Number of Contigs
Length of All Contigs
min scaffold length Megahit Metaspades Megahit Metaspades
500 1,486,743 1,370,062 3849 MB 3637 MB
1000 723,321 744,575 3325 MB 3197 MB
2500 306,489 281,246 2686 MB 2483 MB
5000 154,925 128,562 2158 MB 1956 MB
10000 69,333 56,088 1560 MB 1454 MB
25000 15,980 15,669 754 MB 845 MB
50000 3,899 5,000 347 MB 482 MB
100000 942 1,427 151 MB 240 MB
250000 76 162 27 MB 58 MB
500000 8 18 5.5 MB 12.9 MB
1000000 1 3 1.1 MB 3.3 MB
  • What proportion of reads map to the assembly?

2.4 Genome binning using Metabat2

  • Basic QC on reads
  • Map reads from each sample to Metaspades co-assembly
  • input indexed bam files to metabat2 for binning

2.4.1 Read QC

  • This is a template I used to QC the reads prior to mapping:
  • script is named ‘pre_mapping_QC.SLURM’
#!/bin/bash

#SBATCH --job-name="SAMPLE"                            # name of the job submitted
#SBATCH -p brief-low                                   # name of the queue you are submitting to
#SBATCH -n 30                                          # number of cores/tasks in this job, you get all 20 cores with 2 threads per core with hyperthreading
#SBATCH -N 1                                           # number of nodes
#SBATCH --mem=300G                                     # memory allocation 
#SBATCH -t 2:00:00                                     # time allocated for this job hours:mins:seconds
#SBATCH --mail-user=YOUREMAILHERE@email.com            # will receive an email when job starts, ends or fails
#SBATCH --mail-type=BEGIN,END,FAIL                     # will receive an email when job starts, ends or fails
#SBATCH -o "stdout.%j.%N"                              # standard out %j adds job number to outputfile name and %N adds the node name
#SBATCH -e "stderr.%j.%N"                              # optional but it prints our standard error

# ENTER COMMANDS HERE:

module load java/1.8.0_121

set -e # I think this means if anything fails it immediately exits and doesnt keep plowing ahead.

#Written by Brian Bushnell
#Last updated March 4, 2019

#This script is designed to preprocess data for assembly of overlapping 2x150bp reads from Illumina HiSeq 2500.
#Some numbers and steps may need adjustment for different data types or file paths.
#For large genomes, tadpole and bbmerge (during the "Merge" phase) may need the flag "prefilter=2" to avoid running out of memory.
#"prefilter" makes these take twice as long though so don't use it if you have enough memory.
#The "rm SAMPLE_temp.fq.gz; ln -s reads.fq.gz SAMPLE_temp.fq.gz" is not necessary but added so that any pipeline stage can be easily disabled,
#without affecting the input file name of the next stage.

#interleave reads
reformat.sh in=SAMPLE_R#.fastq.gz out=SAMPLE.fq.gz

#Link the interleaved input file as "SAMPLE_temp.fq.gz"
ln -s SAMPLE.fq.gz SAMPLE_temp.fq.gz

# --- Preprocessing ---

#Remove optical duplicates
clumpify.sh in=SAMPLE_temp.fq.gz out=SAMPLE.clumped.fq.gz dedupe optical
rm SAMPLE_temp.fq.gz; ln -s SAMPLE.clumped.fq.gz SAMPLE_temp.fq.gz

#Remove low-quality regions
filterbytile.sh in=SAMPLE_temp.fq.gz out=SAMPLE.filtered_by_tile.fq.gz
rm SAMPLE_temp.fq.gz; ln -s SAMPLE.filtered_by_tile.fq.gz SAMPLE_temp.fq.gz

#Trim adapters.  Optionally, reads with Ns can be discarded by adding "maxns=0" and reads with really low average quality can be discarded with "maq=8".
bbduk.sh in=SAMPLE_temp.fq.gz out=SAMPLE.trimmed.fq.gz ktrim=r k=23 mink=11 hdist=1 tbo tpe minlen=70 ref=adapters ftm=5 ordered
rm SAMPLE_temp.fq.gz; ln -s SAMPLE.trimmed.fq.gz SAMPLE_temp.fq.gz

#Remove synthetic artifacts and spike-ins by kmer-matching.
bbduk.sh in=SAMPLE_temp.fq.gz out=SAMPLE.filtered.fq.gz k=31 ref=artifacts,phix ordered cardinality
rm SAMPLE_temp.fq.gz; ln -s SAMPLE.filtered.fq.gz SAMPLE_temp.fq.gz
  • This script is just a template, I used it to create a SLURM script for each sample using a loop like this:

2.4.2 Map reads to Assembly

  • this script is called metagenome_map4bin.SLURM
  • Again, this script is just a template, I used it to create a SLURM script for each sample using a loop like the one previously mentioned.

  • This script will output sams as well as indexed bams for each sample mapping to the co-assembly.

2.4.5 recalculate checkM completeness metrics

  • This script runs checkM on the length filtered bins
  • should be run from a directory containing the length filtered bin fastas
  • Named ‘checkm.SLURM’ and is available in the SLURM directory
  • Now that the completness metrics are re-calculated, we need to pull out the bins that meet the requirements of iRep.
  • This chunk identifies genomes that are more than 75% complete and have less than 2.5% contamination (column 13 and 14 in the checkm output).
  • genomes meeting these requirements are then moved to their own ‘good_bins’ directory.

2.5 Calculate growth rate for good bins

  • iRep is not installed system wide on CERES, but it can be easily installed using conda.
  • Most of the work is done, we can actually use the sam files generated prior to metabat to calculate the coverage patterns of the good bins.
  • The following script is a template for running iRep. It is named iREP7_template.SLURM and is available in the SLURM directory.
  • You can generate SLURM scripts for each sample using the same kind of loop that was discussed earlier.
  • Once all these jobs complete we will compile all the results into one file.
  • Once this job finishes we are ready to move on to the real action.

3 Data analysis

3.1 First look

  • How much data do we have?
  • Are there issues with it?
  • Anything suspicious?
## Warning: Missing column names filled in: 'X1' [1]
## Warning: 52298 parsing failures.
## row           col               expected actual                       file
##   1 iRep          a number                  n/a 'ALL_RESULTS_BULK_MAP.txt'
##   1 fragments/Mbp no trailing characters    .0  'ALL_RESULTS_BULK_MAP.txt'
##   2 iRep          a number                  n/a 'ALL_RESULTS_BULK_MAP.txt'
##   2 fragments/Mbp no trailing characters    .0  'ALL_RESULTS_BULK_MAP.txt'
##   3 iRep          a number                  n/a 'ALL_RESULTS_BULK_MAP.txt'
## ... ............. ...................... ...... ..........................
## See problems(...) for more details.
## Warning: NAs introduced by coercion

## 
##  Pearson's product-moment correlation
## 
## data:  dat$iRep and log(dat$`relative abundance`)
## t = -8.492, df = 2276, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.2147699 -0.1351505
## sample estimates:
##        cor 
## -0.1752467

## [1] 2278
## [1] 244

## # A tibble: 449 x 8
##    bin_day     day   genome    ctrl   sub  ther num_NA num_w_4p
##    <chr>       <chr> <chr>    <int> <int> <int>  <int>    <int>
##  1 bin.1004_07 07    bin.1004    NA    NA     2      2        0
##  2 bin.1007_07 07    bin.1007    NA    NA     1      2        0
##  3 bin.1011_07 07    bin.1011     1    NA    NA      2        0
##  4 bin.1012_07 07    bin.1012     2    NA     6      1        1
##  5 bin.1016_07 07    bin.1016     2     2     3      0        0
##  6 bin.102_07  07    bin.102      1    NA     3      1        0
##  7 bin.103_07  07    bin.103      3     8     4      0        2
##  8 bin.1031_07 07    bin.1031     4     7     5      0        3
##  9 bin.1042_07 07    bin.1042     1    NA    NA      2        0
## 10 bin.1044_07 07    bin.1044     9    NA    NA      2        1
## # … with 439 more rows
## # A tibble: 138 x 8
##    bin_day     day   genome    ctrl   sub  ther num_NA num_w_4p
##    <chr>       <chr> <chr>    <int> <int> <int>  <int>    <int>
##  1 bin.1012_07 07    bin.1012     2    NA     6      1        1
##  2 bin.103_07  07    bin.103      3     8     4      0        2
##  3 bin.1031_07 07    bin.1031     4     7     5      0        3
##  4 bin.1044_07 07    bin.1044     9    NA    NA      2        1
##  5 bin.110_07  07    bin.110      4     3     3      0        1
##  6 bin.127_07  07    bin.127      4     5     2      0        2
##  7 bin.175_07  07    bin.175      3     5     7      0        2
##  8 bin.205_07  07    bin.205      3     6     6      0        2
##  9 bin.209_07  07    bin.209      2     4     3      0        1
## 10 bin.231_07  07    bin.231      6     4     5      0        3
## # … with 128 more rows
## # A tibble: 70 x 8
##    bin_day     day   genome    ctrl   sub  ther num_NA num_w_4p
##    <chr>       <chr> <chr>    <int> <int> <int>  <int>    <int>
##  1 bin.103_07  07    bin.103      3     8     4      0        2
##  2 bin.1031_07 07    bin.1031     4     7     5      0        3
##  3 bin.127_07  07    bin.127      4     5     2      0        2
##  4 bin.175_07  07    bin.175      3     5     7      0        2
##  5 bin.205_07  07    bin.205      3     6     6      0        2
##  6 bin.231_07  07    bin.231      6     4     5      0        3
##  7 bin.235_07  07    bin.235      8    10     8      0        3
##  8 bin.280_07  07    bin.280      5     8     7      0        3
##  9 bin.313_07  07    bin.313      3     8     5      0        2
## 10 bin.316_07  07    bin.316      7     5     7      0        3
## # … with 60 more rows
## # A tibble: 27 x 8
##    bin_day     day   genome    ctrl   sub  ther num_NA num_w_4p
##    <chr>       <chr> <chr>    <int> <int> <int>  <int>    <int>
##  1 bin.1031_07 07    bin.1031     4     7     5      0        3
##  2 bin.231_07  07    bin.231      6     4     5      0        3
##  3 bin.235_07  07    bin.235      8    10     8      0        3
##  4 bin.280_07  07    bin.280      5     8     7      0        3
##  5 bin.316_07  07    bin.316      7     5     7      0        3
##  6 bin.433_07  07    bin.433      5     5     6      0        3
##  7 bin.493_07  07    bin.493      5     4     5      0        3
##  8 bin.886_07  07    bin.886      6     8     9      0        3
##  9 bin.205_35  35    bin.205      4     4     5      0        3
## 10 bin.235_35  35    bin.235      9     4     5      0        3
## # … with 17 more rows
## # A tibble: 244 x 8
##    genome   `07`  `35`  `78` all_times d7g   d35g  d78g 
##    <chr>   <int> <int> <int>     <int> <lgl> <lgl> <lgl>
##  1 bin.316     3     3     2         8 TRUE  TRUE  FALSE
##  2 bin.280     3     3     1         7 TRUE  TRUE  FALSE
##  3 bin.716     1     3     3         7 FALSE TRUE  TRUE 
##  4 bin.235     3     3     0         6 TRUE  TRUE  FALSE
##  5 bin.796     1     3     2         6 FALSE TRUE  FALSE
##  6 bin.205     2     3     0         5 FALSE TRUE  FALSE
##  7 bin.306     0     2     3         5 FALSE FALSE TRUE 
##  8 bin.735     2     2     1         5 FALSE FALSE FALSE
##  9 bin.886     3     1     1         5 TRUE  FALSE FALSE
## 10 bin.988     1     3     1         5 FALSE TRUE  FALSE
## # … with 234 more rows
## # A tibble: 8 x 4
##   genome    `07`  `35`  `78`
##   <chr>    <int> <int> <int>
## 1 bin.1031     3    NA    NA
## 2 bin.231      3    NA    NA
## 3 bin.235      3     3    NA
## 4 bin.280      3     3    NA
## 5 bin.316      3     3    NA
## 6 bin.433      3    NA    NA
## 7 bin.493      3     3    NA
## 8 bin.886      3    NA    NA
## # A tibble: 11 x 4
##    genome   `07`  `35`  `78`
##    <chr>   <int> <int> <int>
##  1 bin.205    NA     3    NA
##  2 bin.235     3     3    NA
##  3 bin.280     3     3    NA
##  4 bin.316     3     3    NA
##  5 bin.493     3     3    NA
##  6 bin.550    NA     3    NA
##  7 bin.642    NA     3    NA
##  8 bin.716    NA     3     3
##  9 bin.796    NA     3    NA
## 10 bin.826    NA     3    NA
## 11 bin.988    NA     3    NA
## # A tibble: 8 x 4
##   genome   `07`  `35`  `78`
##   <chr>   <int> <int> <int>
## 1 bin.306    NA    NA     3
## 2 bin.389    NA    NA     3
## 3 bin.55     NA    NA     3
## 4 bin.623    NA    NA     3
## 5 bin.693    NA    NA     3
## 6 bin.698    NA    NA     3
## 7 bin.716    NA     3     3
## 8 bin.982    NA    NA     3
## # A tibble: 4 x 5
##   genome   `07`  `35`  `78` d7vd35
##   <chr>   <int> <int> <int>  <int>
## 1 bin.235     3     3    NA      6
## 2 bin.280     3     3    NA      6
## 3 bin.316     3     3    NA      6
## 4 bin.493     3     3    NA      6
## # A tibble: 1 x 5
##   genome   `07`  `35`  `78` d35vd78
##   <chr>   <int> <int> <int>   <int>
## 1 bin.716    NA     3     3       6
## # A tibble: 0 x 5
## # … with 5 variables: genome <chr>, `07` <int>, `35` <int>, `78` <int>,
## #   d07vd78 <int>
## # A tibble: 15 x 4
##    genome   ctrl_07 ctrl_35 ctrl_78
##    <chr>      <int>   <int>   <int>
##  1 bin.1044       9      10      NA
##  2 bin.127        4       5      NA
##  3 bin.231        6       5      NA
##  4 bin.235        8       9       2
##  5 bin.280        5       8      NA
##  6 bin.285        4       4       1
##  7 bin.316        7       9      NA
##  8 bin.322        7       8      NA
##  9 bin.459        5       7       1
## 10 bin.493        5       6      NA
## 11 bin.716        6      10       4
## 12 bin.735        6       9       2
## 13 bin.810        5       5      NA
## 14 bin.820        9       4      NA
## 15 bin.988        9       7      NA
## # A tibble: 8 x 4
##   genome  sub_07 sub_35 sub_78
##   <chr>    <int>  <int>  <int>
## 1 bin.205      6      4     NA
## 2 bin.235     10      4      2
## 3 bin.280      8      7      1
## 4 bin.316      5      8      4
## 5 bin.424      5      4     NA
## 6 bin.493      4      5     NA
## 7 bin.560      5      8      2
## 8 bin.735      4      7      2
## # A tibble: 11 x 4
##    genome   ther_07 ther_35 ther_78
##    <chr>      <int>   <int>   <int>
##  1 bin.1012       6       4      NA
##  2 bin.103        4       6       3
##  3 bin.205        6       5      NA
##  4 bin.231        5       6      NA
##  5 bin.235        8       5       1
##  6 bin.280        7       7       5
##  7 bin.316        7      10       9
##  8 bin.424        6       5      NA
##  9 bin.493        5       4      NA
## 10 bin.796        4       5       4
## 11 bin.886        9       7       1
## [1] 1572
## [1] 102

## [1] 1062
## [1] 56

## [1] 507
## [1] 22



  • Well, the data aren’t idea, but we’ll try to get something meaningful out of it.

  • Specifically I am interested if there is statistical evidence for these two hypothesis:
    1. growth rates are supressed in the abx treatments relative to the controls
    2. growth rates are lower in the d35 communities relative to the D7 communities

3.2 Treatment Effects

  • Only trying to asses treatment effect within each day.

## # A tibble: 6 x 8
## # Groups:   genome [5]
##   genome  term     comparison estimate conf.low conf.high tuk_pval fdr.pval
##   <chr>   <chr>    <chr>         <dbl>    <dbl>     <dbl>    <dbl>    <dbl>
## 1 bin.235 treatme… ther-ctrl    -0.186   -0.353   -0.0196 0.0265    0.0795 
## 2 bin.886 treatme… ther-ctrl    -0.272   -0.477   -0.0673 0.00837   0.0251 
## 3 bin.280 treatme… ther-ctrl    -0.278   -0.541   -0.0160 0.0367    0.110  
## 4 bin.316 treatme… ther-sub     -0.162   -0.307   -0.0165 0.0282    0.0845 
## 5 bin.231 treatme… sub-ctrl     -0.272   -0.459   -0.0843 0.00585   0.00877
## 6 bin.231 treatme… ther-ctrl    -0.343   -0.519   -0.168  0.000589  0.00177

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: iRep ~ treatment + (1 | genome)
##    Data: D7_treat_comp
## 
## REML criterion at convergence: -74.5
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.4914 -0.5265 -0.0456  0.5822  3.7026 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  genome   (Intercept) 0.03276  0.1810  
##  Residual             0.02796  0.1672  
## Number of obs: 149, groups:  genome, 8
## 
## Fixed effects:
##                Estimate Std. Error        df t value Pr(>|t|)    
## (Intercept)     1.97145    0.06863   8.41736  28.725 1.07e-09 ***
## treatmentsub   -0.06411    0.03429 139.22518  -1.870   0.0636 .  
## treatmentther  -0.20802    0.03394 139.06323  -6.129 8.59e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) trtmnts
## treatmentsb -0.260        
## treatmntthr -0.262  0.530
## # A tibble: 5 x 4
##   bin     marker_lineage               Completeness Contamination
##   <chr>   <chr>                               <dbl>         <dbl>
## 1 bin.886 k__Bacteria_(UID2372)                97.2          0.94
## 2 bin.231 o__Clostridiales_(UID1212)           91.6          1.59
## 3 bin.280 o__Clostridiales_(UID1226)           89.0          0.63
## 4 bin.316 o__Clostridiales_(UID1212)           87.5          0.48
## 5 bin.235 f__Lachnospiraceae_(UID1256)         87.4          0.97

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: iRep ~ treatment + (1 | genome)
##    Data: D35_treat_comp
## 
## REML criterion at convergence: -281.5
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.0364 -0.6657 -0.0746  0.5399  3.3851 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  genome   (Intercept) 0.03565  0.1888  
##  Residual             0.01171  0.1082  
## Number of obs: 211, groups:  genome, 11
## 
## Fixed effects:
##                 Estimate Std. Error         df t value Pr(>|t|)    
## (Intercept)     1.693221   0.058269  10.616555  29.059 1.83e-11 ***
## treatmentsub   -0.005495   0.018189 198.141196  -0.302    0.763    
## treatmentther  -0.002010   0.018208 198.108917  -0.110    0.912    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) trtmnts
## treatmentsb -0.144        
## treatmntthr -0.143  0.464

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: iRep ~ treatment + (1 | genome)
##    Data: D78_treat_comp
## 
## REML criterion at convergence: -198.2
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.40673 -0.65882 -0.06031  0.68226  2.45716 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  genome   (Intercept) 0.01039  0.1019  
##  Residual             0.01189  0.1090  
## Number of obs: 147, groups:  genome, 8
## 
## Fixed effects:
##                Estimate Std. Error        df t value Pr(>|t|)    
## (Intercept)     1.64378    0.03964   9.03716  41.466 1.27e-11 ***
## treatmentsub   -0.07263    0.02256 137.22989  -3.220   0.0016 ** 
## treatmentther  -0.05829    0.02246 137.41201  -2.595   0.0105 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) trtmnts
## treatmentsb -0.302        
## treatmntthr -0.305  0.528


3.3 Time Effect

  • Trying to determine if there is a difference in growth rates between D7 and D35 in each diet in isolation.

## # A tibble: 5 x 8
## # Groups:   genome [5]
##   genome   term  comparison estimate conf.low conf.high tuk_pval fdr.pval
##   <chr>    <chr> <chr>         <dbl>    <dbl>     <dbl>    <dbl>    <dbl>
## 1 bin.1044 day   35-07        -0.370   -0.538   -0.202  0.000234 0.000234
## 2 bin.235  day   35-07        -0.172   -0.295   -0.0493 0.00919  0.00919 
## 3 bin.735  day   35-07        -0.203   -0.379   -0.0265 0.0273   0.0273  
## 4 bin.316  day   35-07        -0.199   -0.324   -0.0747 0.00407  0.00407 
## 5 bin.231  day   35-07        -0.170   -0.322   -0.0187 0.0317   0.0317

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: iRep ~ day + (1 | genome)
##    Data: ctrl735_dat
## 
## REML criterion at convergence: -12.2
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.2514 -0.4663 -0.1007  0.3758  8.2607 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  genome   (Intercept) 0.06485  0.2547  
##  Residual             0.04246  0.2061  
## Number of obs: 201, groups:  genome, 15
## 
## Fixed effects:
##              Estimate Std. Error        df t value Pr(>|t|)    
## (Intercept)   1.92960    0.06918  15.56667  27.891 1.03e-14 ***
## day35        -0.17779    0.02946 185.48691  -6.035 8.45e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##       (Intr)
## day35 -0.224

## # A tibble: 3 x 8
## # Groups:   genome [3]
##   genome  term  comparison estimate conf.low conf.high tuk_pval fdr.pval
##   <chr>   <chr> <chr>         <dbl>    <dbl>     <dbl>    <dbl>    <dbl>
## 1 bin.493 day   35-07        -0.234   -0.374   -0.0931  0.00567  0.00567
## 2 bin.735 day   35-07        -0.193   -0.304   -0.0833  0.00325  0.00325
## 3 bin.316 day   35-07        -0.220   -0.375   -0.0646  0.00979  0.00979

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: iRep ~ day + (1 | genome)
##    Data: sub735_dat
## 
## REML criterion at convergence: -80
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.63109 -0.58720 -0.02263  0.68275  2.95601 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  genome   (Intercept) 0.04323  0.2079  
##  Residual             0.01746  0.1321  
## Number of obs: 94, groups:  genome, 8
## 
## Fixed effects:
##             Estimate Std. Error       df t value Pr(>|t|)    
## (Intercept)  1.82949    0.07612  7.58090  24.033 1.97e-08 ***
## day35       -0.10652    0.02806 85.39590  -3.796 0.000274 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##       (Intr)
## day35 -0.185

## # A tibble: 0 x 8
## # Groups:   genome [0]
## # … with 8 variables: genome <chr>, term <chr>, comparison <chr>,
## #   estimate <dbl>, conf.low <dbl>, conf.high <dbl>, tuk_pval <dbl>,
## #   fdr.pval <dbl>

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: iRep ~ day + (1 | genome)
##    Data: ther735_dat
## 
## REML criterion at convergence: -111.4
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.9844 -0.5342 -0.1127  0.4230  4.7538 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  genome   (Intercept) 0.03131  0.1769  
##  Residual             0.01828  0.1352  
## Number of obs: 131, groups:  genome, 11
## 
## Fixed effects:
##              Estimate Std. Error        df t value Pr(>|t|)    
## (Intercept)   1.67635    0.05592  10.98071  29.977 6.96e-12 ***
## day35         0.02964    0.02387 119.28428   1.242    0.217    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##       (Intr)
## day35 -0.208

4 Conclusions

  • More work is needed on the statistical analysis side of things.
    -seems like mixed models could be useful.
  • Some evidence that bacterial growth rates are reduced in the antibiotic treated birds
    -seems to be limited to the early timepoint.
  • Some evidence that bacterial growth rates are higher early on (D7).
    -this effect is not seen in the therapeutic treatment group.
  • Future studies seeking to use this method should carefully consider experimental design
    -try to maximize the overlap in bacterial community per group
    -cant compare growth rates if microbes only exist in one treatment.
    -perhaps sample very soon after the initiation of treatment.
    -before community has time to completely change.